Group 01 project: Analysis of Gene Expression in Parkinson's Disease

Rune Daucke, runda
David Faurdal, s144523
Luisa Weisch, s233028

Introduction

Gene expression data (Affymetrix platform), collected from the blood of:

  • 20 healthy controls
  • 40 patients diagnosed with sporadic Parkinson's disease who have not received drug treatment

Box_plot

Cleaning and Filtering Dataset

  • Arranging samples by condition

  • Filtering out AFFX-related spike-in control probes

  • Reshaping into tidy format

  • Mapping probe IDs to genes using the Affymetrix database

  • Sorting samples
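
As a rough illustration of the filtering, reshaping and sorting steps above, here is a minimal sketch on a made-up table. The column names (`probe_id`, `control_01`, `pd_01`), the probe IDs and the values are assumptions for the example and not the project's real data; the probe-to-gene mapping is left out because it depends on the annotation source used.

```r
library(dplyr)
library(tidyr)

# Toy expression table: one row per probe, one column per sample (values made up)
raw <- data.frame(
  probe_id   = c("1007_s_at", "1053_at", "AFFX-BioB-5_at"),
  control_01 = c(8.1, 5.2, 9.9),
  pd_01      = c(8.4, 5.0, 9.7)
)

df_long_toy <- raw |>
  filter(!startsWith(probe_id, "AFFX")) |>   # drop AFFX control probes
  pivot_longer(-probe_id,
               names_to  = "Sample",
               values_to = "Expression") |>  # tidy format: one row per probe/sample
  mutate(Condition = if_else(startsWith(Sample, "pd"), "PD", "Control")) |>
  arrange(Condition, Sample)                 # sort samples by condition
```

Here the condition is derived from the sample name only because the toy columns are named that way; in practice it would come from the sample annotation.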

Augmentation

  • Mean expression of sets of potential biomarkers
    • Neurodegenerative-associated genes
    • Pro-inflammatory genes

# Inflammation index
inflam_list <- c("IL1rn", "TNF", "Saa3", "Emr1", "Adam8", "Itgam")

present_genes <- find_genes(inflam_list, all_genes)
df_wide$inflam_mean <- rowMeans(df_wide[present_genes], na.rm = TRUE)
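
The `find_genes()` helper is not shown on the slide. One plausible reading, given the mixed capitalisation in `inflam_list`, is a case-insensitive lookup; the sketch below is a hypothetical stand-in under that assumption (the name `find_genes_sketch` and all toy values are invented for illustration) and the project's real helper may behave differently.

```r
# Hypothetical helper: return the entries of `all_genes` that match `wanted`,
# ignoring case. This is an assumption about what find_genes() does.
find_genes_sketch <- function(wanted, all_genes) {
  all_genes[toupper(all_genes) %in% toupper(wanted)]
}

all_genes_toy   <- c("TNF", "IL1RN", "GAPDH", "ITGAM")
inflam_list_toy <- c("IL1rn", "TNF", "Saa3", "Emr1", "Adam8", "Itgam")
present <- find_genes_sketch(inflam_list_toy, all_genes_toy)

# Per-sample mean over the genes that are present (toy values, two samples)
df_wide_toy <- data.frame(TNF = c(5, 7), IL1RN = c(3, 5),
                          GAPDH = c(10, 11), ITGAM = c(4, 6))
df_wide_toy$inflam_mean <- rowMeans(df_wide_toy[present], na.rm = TRUE)
```

Genes in the list that are absent from the dataset are silently skipped, so the index is the mean over whichever subset is actually measured.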

Analysis

  • PCA
    • Normalization
    • PCA calculation
    • Plotting
  • Differential Expression Analysis
    • Calculation of log2 of the expression
    • Fitting a linear model to predict the log2 expression from the condition
    • Adjusting p-values
    • Plotting
  • Differential Expression Analysis of known biomarker genes
    • Visualization of the mean of biomarker specific gene expression
  • KEGG Enrichment Analysis
    • Separating up- and down-regulated genes
    • Performing KEGG analysis on the up- or down-regulated genes
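
The PCA steps listed above (normalization, calculation, plotting) could look roughly like the following base R sketch. The matrix is random toy data, and per-gene centring and unit-variance scaling is an assumed choice of normalization, not necessarily the one used in the project.

```r
set.seed(1)

# Toy matrix: 6 samples (rows) x 4 genes (columns), random values
expr <- matrix(rnorm(24, mean = 8), nrow = 6,
               dimnames = list(paste0("sample_", 1:6), paste0("gene_", 1:4)))
condition <- rep(c("Control", "PD"), each = 3)

# Normalization and PCA: centre each gene and scale it to unit variance
pca <- prcomp(expr, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Plot the first two components, coloured by condition
plot(pca$x[, 1], pca$x[, 2],
     col = ifelse(condition == "PD", "red", "blue"),
     xlab = "PC1", ylab = "PC2", pch = 19)
```

If the conditions differed strongly in expression, the two colours would tend to separate along one of the leading components.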

Results - PCA

PCA

Results - Diff. Expression

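library(tidyverse) # dplyr, tidyr, purrr
library(broom)     # tidy()

# Fit one linear model per gene: log2 expression explained by condition
df_nested_exp <- df_long |>
  mutate(log2_exp = log2(Expression)) |> # transform data to log2
  group_by(Gene) |>
  nest() |>                              # one row per gene, samples in `data`
  mutate(model_object = map(data, ~lm(log2_exp ~ Condition, 
                                      data = .x))) |>
  # extract coefficients with 95% confidence intervals
  mutate(model_object_tidy = map(model_object, ~ tidy(.x, conf.int = TRUE,
                                                      conf.level = 0.95)))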
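
The p-value adjustment step is not shown on the slide. The sketch below illustrates the idea in base R on simulated data: fit the same per-gene model, pull out the condition term, and correct across genes. The Benjamini-Hochberg method is an assumption here, since the slide does not say which correction was used, and all names ending in `toy` as well as `results` are made up for the example.

```r
set.seed(42)

# Toy data: 5 genes x 2 conditions x 10 replicates, with no true effect simulated
toy <- expand.grid(Gene = paste0("gene_", 1:5),
                   Condition = c("Control", "PD"),
                   Replicate = 1:10)
toy$log2_exp <- rnorm(nrow(toy), mean = 8)

# Fit log2_exp ~ Condition per gene; keep the PD-vs-Control estimate and p-value
fits <- lapply(split(toy, toy$Gene), function(d) {
  coefs <- summary(lm(log2_exp ~ Condition, data = d))$coefficients
  data.frame(estimate = coefs["ConditionPD", "Estimate"],
             p.value  = coefs["ConditionPD", "Pr(>|t|)"])
})
results <- do.call(rbind, fits)

# Multiple-testing correction across all genes (Benjamini-Hochberg, assumed)
results$q.value <- p.adjust(results$p.value, method = "BH")
```

A volcano plot then shows `estimate` on the x-axis against `-log10(q.value)` on the y-axis.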

volcano

Results - Diff. Expression of known biomarkers

Neurodegenerative-associated genes

neurodegenerative_density

neurodegenerative_boxplot

Test for a statistically significant difference
# Calculate p-value (Welch two-sample t-test, the t.test() default)
t_test_result <- t.test(ndg_mean ~ Condition, data = df)

p-value = 0.3512

Results - Diff. Expression of known biomarkers

Pro-inflammatory genes

inflammatory_density

inflammatory_boxplot

Test for a statistically significant difference
# Calculate p-value (Welch two-sample t-test, the t.test() default)
t_test_result <- t.test(inflam_mean ~ Condition, data = df)

p-value = 0.348

KEGG enrichment

kegg_enrichment

KEGG pathway

kegg_pathway

Conclusion

  • No clustering / separation along the PCs
  • No significantly differentially expressed genes
  • What to do next?
    • With patient metadata, clustering might present itself
    • Non-linear structure: UMAP or t-SNE could reveal clustering that PCA does not
    • ANN, KNN, or logistic modelling could reveal expression patterns
    • Redo the experiment
    • Perhaps blood transcriptomics on PD patients isn't the way to go, though a blood-based biomarker would be ideal